Search for: All records
Editors contains: "Fang, F."


  1. Fang, F. (Ed.)
  2. Fang, F. (Ed.)
  3. Melo, S. F.; Fang, F. (Ed.)
    Existing risk-averse reinforcement learning approaches still face several challenges, including the lack of a global optimality guarantee and the need to learn from long consecutive trajectories. Such trajectories are prone to visiting hazardous states, a major concern in the risk-averse setting. This paper proposes Transition-based vOlatility-controlled Policy Search (TOPS), a novel algorithm that solves risk-averse problems by learning from individual transitions. We prove that, in the over-parameterized neural-network regime, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, a convergence rate comparable to that of state-of-the-art risk-neutral policy-search methods. The algorithm is evaluated on challenging MuJoCo robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results show that TOPS achieves state-of-the-art performance among existing risk-averse policy-search methods.
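
To make the transition-based mean-variance idea in entry 3 concrete, here is a minimal Python sketch, not the authors' implementation, of a risk-averse objective estimated from per-transition rewards. The penalty weight lam, the objective form J = E[r] - lam * Var[r], and the synthetic reward samples are illustrative assumptions based on our reading of the abstract.

    import numpy as np

    # Hedged sketch: a Monte-Carlo estimate of a mean-variance style
    # risk-averse objective computed from individual transitions rather
    # than full trajectories, in the spirit of the abstract above. The
    # penalty weight `lam` and the reward samples are illustrative
    # assumptions, not values from the paper.

    def mean_volatility_objective(rewards, lam):
        """Estimate J = E[r] - lam * Var[r] from per-transition rewards.

        `rewards` holds one reward per sampled transition (s, a, r, s'),
        so the estimate needs no long consecutive trajectories.
        """
        return rewards.mean() - lam * rewards.var()

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Stand-in rewards from two hypothetical policies: policy B has
        # a higher mean reward but much higher volatility than policy A.
        r_a = rng.normal(loc=1.0, scale=0.2, size=10_000)
        r_b = rng.normal(loc=1.2, scale=1.5, size=10_000)
        for name, r in (("A", r_a), ("B", r_b)):
            print(name, round(mean_volatility_objective(r, lam=0.5), 3))
        # With lam = 0.5 the risk-averse objective prefers policy A even
        # though policy B has the higher expected reward.

A larger lam makes the objective more risk-averse, while lam = 0 recovers the usual risk-neutral expected-reward criterion.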